Nucleoside deaminases: the key players in base editing toolkit

The development of nucleoside deaminase-containing base editors has enabled targeted single-base changes with high efficiency and precision. Such nucleoside deaminases include adenosine and cytidine deaminases, which catalyze adenosine-to-inosine (A-to-I) and cytidine-to-uridine (C-to-U) conversion, respectively. These nucleoside deaminases are under the spotlight because of their vast application potential in gene editing. Recent advances in engineering existing nucleoside deaminases and discovering new ones have greatly broadened the application scope and improved the editing specificity of base editors. In this review, we cover current knowledge about the deaminases used in base editors, including their key structural features, working mechanisms, optimization, and evolution.


INTRODUCTION
Eukaryotic genomes are composed of billions of major nucleobases. While modifications of these bases can lead to functional diversity, unexpected base mutations can cause genomic instability (Korf et al. 2019). Therefore, developing precise and efficient tools to achieve base conversion in DNA or RNA molecules has been a long-sought goal (Doudna 2020). Since their advent, the clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein (Cas) systems have been widely applied in gene editing. The canonical CRISPR-Cas9 system contains a Cas9 protein whose DNA-targeting specificity and cutting activity are programmed by a short guide RNA (Doudna and Charpentier 2014; Jinek et al. 2012; Mali et al. 2013), establishing a platform for more versatile gene editing. Later on, base editors (BEs), a more advanced gene editing toolkit, were developed to achieve precise and efficient editing at the single-base level, without triggering double-strand breaks (DSBs) or requiring donor DNA templates (Gaudelli et al. 2017; Gehrke et al. 2018; Komor et al. 2016; Nishida et al. 2016; Wang et al. 2017, 2018, 2020, 2021). Genetic manipulation at the single-base level enables scientists to study gene function or correct disease-causing mutations, which holds tremendous value not only for basic research but also for disease treatment (Wang and Doudna 2023).
BEs contain two primary components: a programmable DNA-binding protein (locator), such as a catalytically impaired Cas nuclease, and a DNA-modifying enzyme (effector), such as a nucleoside deaminase (Yang and Chen 2020). BEs can be classified as cytosine base editors (CBEs) and adenine base editors (ABEs) according to the nucleoside deaminases they contain (Huang et al. 2021a). Uridine (U) and thymidine (T) can be formed by the spontaneous hydrolytic deamination of cytidine (C) and 5-methylcytidine, respectively. In humans, C-to-U deamination can also be catalyzed by numerous cytidine deaminases, the best known of which belong to a family of mammalian enzymes called the "activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like (AID/APOBEC) protein family" (Wedekind et al. 2003; Yang et al. 2017) (Figs. 1A and 2B). Adenosine (A), like cytidine, contains an exocyclic amino group, and deamination changes its pairing properties. Deamination of adenosine to inosine (A-to-I) in RNA can be catalyzed by the adenosine deaminase acting on RNA (ADAR) protein family (Savva et al. 2012) (Fig. 1B). In addition, other nucleoside deaminases from prokaryotic organisms were also discovered, e.g., transfer RNA (tRNA) adenosine deaminase (TadA) (Kim et al. 2006; Wolf et al. 2002) and double-strand DNA-specific deaminase toxin A (DddA) (Mok et al. 2020), the members of which have been utilized to develop gene-editing tools as well. Notably, DddA targets double-stranded DNA (dsDNA) instead of single-stranded DNA (ssDNA) (Mok et al. 2020; Salter and Smith 2018), enabling it to fulfill editing goals at places where other deaminases cannot reach (Kim and Chen 2023). More recently, AI-based protein structure prediction and clustering established a suite of ssDNA deaminases and dsDNA deaminases, further enriching the deaminase tool family (Huang et al. 2023).
Thus, the nucleoside deaminase, the effector with base-modifying activity, plays a determining role in the efficiency, scope, accuracy, and specificity of base editing (Yang and Chen 2020). Extensive studies on the discovery, engineering, and evolution of nucleoside deaminases have significantly enlarged the BE toolkit (Barrera-Paez and Moraes 2022; Huang et al. 2021a; Yang et al. 2019). In this review, we summarize the cytidine and adenosine deaminases that have been widely applied in the base editing field and highlight the structural and functional features of the native enzymes and their engineered variants, which have led to the development of more efficient and precise BEs.
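The two deamination outcomes described in this Introduction (C-to-U, read as T; A-to-I, read as G) can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and serve only as an illustration; the sketch assumes an idealized, complete conversion at one chosen position and ignores DNA repair.

```python
# Read-out of deaminated bases as stated in the text:
#   C -> U, which is read as T;  A -> I, which is read as G.
READ_AS = {"C": "T", "A": "G"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def edit_base(strand: str, index: int) -> str:
    """Return the strand as it would be read after deaminating strand[index] (0-based)."""
    base = strand[index]
    if base not in READ_AS:
        raise ValueError(f"{base!r} at index {index} is neither C nor A")
    return strand[:index] + READ_AS[base] + strand[index + 1:]


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]


# Toy example (hypothetical sequence): deaminate the C in GACT.
edited = edit_base("GACT", 2)           # the C is read as T
partner = reverse_complement(edited)    # opposite strand after replication
```

After replication, the opposite strand carries the complementary change, so a C-to-T read-out on one strand corresponds to a C•G-to-T•A base pair conversion, and an A-to-G read-out to an A•T-to-G•C conversion.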

CYTIDINE DEAMINASE APOBEC/AID, ssDNA deaminases
Each member of the APOBEC family has specific physiological functions that involve the binding of nucleic acid and catalysis of cytidine-to-uridine deamination in the context of RNA and/or ssDNA (Salter et al. 2016). In human cells, the APOBEC family consists of 11 genes, i.e., APOBEC1 (A1), AID, APOBEC2 (A2), APOBEC3A-H (A3A-H), and APOBEC4 (A4). These genes and their alternatively spliced variants can produce various protein products. These deaminases all contain at least one catalytic domain that comprises the canonical zinc-dependent deaminase signature motif H/C-X-E-X(23-28)-P-C-X(2-4)-C (HECC) (Pecori et al. 2022) (Fig. 2B). A number of structures of APOBECs have been reported without nucleic acid or nucleotide ligands bound, including those of the single cytidine deaminase (CDA) domain-containing APOBECs like hA1, hAID, hA3A, hA3C, and hA3H, as well as those of the C-terminal catalytic domain of the dual CDA domain-containing APOBECs like hA3B, hA3F and hA3G (Salter and Smith 2018). These structures all adopt a typical CDA fold wherein a five-strand β-sheet (β1-β5) is surrounded by six α-helices (α1-α6). Of the loops (L1 to L10) that connect the β-strands and α-helices, loops 1, 3, 5 and 7 participate in the formation of the substrate binding groove wherein the catalytic zinc ion is coordinated by the His-Glu (H and E) and Cys-Cys (C) motifs on α2 and L5/α3, respectively (Fig. 2C, Zn gray sphere). Most A3 enzymes prefer a 5'-TC (or UC) dinucleotide sequence in ssDNA or single-stranded RNA substrates, except for A3G, which prefers to deaminate cytidines in a 5'-CC motif (Table 1) (Pecori et al. 2022; Salter et al. 2016).
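As an illustration of the signature motif above, the following Python sketch translates H/C-X-E-X(23-28)-P-C-X(2-4)-C into a regular expression and scans a protein sequence for it. The function name is hypothetical and the test sequence is synthetic, built only to contain one motif instance; it is not a real APOBEC sequence. A regular expression match indicates only sequence-level similarity and says nothing about fold or activity.

```python
import re

# H/C-X-E-X(23-28)-P-C-X(2-4)-C, where X is any residue.
HECC_MOTIF = re.compile(r"[HC].E.{23,28}PC.{2,4}C")


def find_deaminase_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start() + 1, m.group()) for m in HECC_MOTIF.finditer(protein.upper())]


# Synthetic toy sequence (hypothetical): H-A-E, 25 spacer residues, P-C, 3 residues, C.
toy_protein = "MK" + "HAE" + "G" * 25 + "PC" + "AAA" + "C" + "LL"
hits = find_deaminase_motifs(toy_protein)
```

Here `hits` contains a single entry starting at residue 3. A spacer shorter than 23 residues between the E and the P-C pair would not match.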
Since the introduction of the BE3 system, more BEs have been developed through optimizing or replacing the deaminase moiety. For instance, several cytidine deaminases other than rat APOBEC1 (rA1), including hAID (Hess et al. 2016; Ma et al. 2016), PmCDA1 (Nishida et al. 2016), hA3A (Gehrke et al. 2018; Wang et al. 2018), hA3B (Doman et al. 2020), hA3G-CDA2 (Liu et al. 2020) and mouse APOBEC3 (mA3) (Wang et al. 2021), have been put into the conventional BE3 architecture. As CpG methylation generally has a negative effect on the C-to-T editing efficiency of rA1-based BE3, Wang et al. replaced the rA1 moiety with hA3A and demonstrated that hA3A-BE3 is the most efficient at methylated CpG sites among BEs that follow the conventional BE3 architecture (Wang et al. 2018). To narrow the editing window of hA3A-BE3, Wang et al. further introduced the Y130F or Y132D mutation into the deaminase, both of which are in loop 7 and predicted to interact directly with the nucleic acid substrate (Fig. 2D) (Wang et al. 2018). In addition, through AI-assisted structure prediction, Huang et al. identified some novel deaminases with disparate deamination motif preferences on ssDNA substrates, including Sdd3, Sdd6, and Sdd7, which would further expand the editing scope of base editors (Table 1). Though these Sdds share a core structure similar to DddA and theoretically do not belong to the APOBEC/AID family, they showed more robust cytosine base editing activity on ssDNA than some APOBEC/AID deaminases and could be used to develop base editing tools (Huang et al. 2023).
Meanwhile, BEs with novel architectures were also developed, of which the transformer base editor (tBE) system solved the off-target problem of conventional BEs through its ingenious design. Finding that the inactive CDA domain of mA3 functions as a deoxycytidine deaminase inhibitor (dCDI), Wang et al. took advantage of mA3dCDI to develop the tBE system. tBE remains inactive at off-target sites due to the fusion of a cleavable mA3dCDI, but is transformed into a deamination-competent form by cleaving off the dCDI at on-target sites, thereby eliminating off-target (OT) mutations (Wang et al. 2021).
All in all, the APOBEC/AID family members and other ssDNA deaminases have been exploited to play critical roles in various base editors.

DddA
Compared to the APOBEC/AID family, DddA has a unique ability to catalyze the direct deamination of cytidine in dsDNA. Its distinct dsDNA binding and deamination activities have also brought new opportunities for gene editing. Previously, the lack of feasible nucleic acid delivery systems across the mitochondrial double membranes hindered the application of CRISPR-based DNA editing tools within mitochondria. Therefore, the development of novel mitochondrial base editing tools to repair mutant mitochondrial genomes and manipulate mitochondrial gene expression has been particularly challenging.
In 2020, a novel bacterial toxin, DddA, was identified in Burkholderia cenocepacia. Its C-terminal toxin domain (DddAtox) acts as a dsDNA deaminase, deaminating cytosine in dsDNA to form uracil (Mok et al. 2020). To overcome the cytotoxicity of the full-length toxin-like protein, DddAtox is further divided into two parts, DddAtox-N and DddAtox-C. Both parts are then fused with a mitochondrial targeting signal sequence [...] (Fig. 3A). Leveraging the targeting activity of TALE and the dsDNA deamination activity of DddA, efficient C-to-T editing of mtDNA target sites has been achieved by DdCBE (Mok et al. 2020). Furthermore, by fusing another effector, i.e., an adenosine deaminase, with DddA, a new mitochondrial genome editor, TALED, was developed to induce A-to-G editing (Fig. 3A) (Cho et al. 2022).
To reduce off-target editing of DdCBE, Lee et al. introduced T1391A and K1389A in the binding interface of DddAtox-N and DddAtox-C and constructed HiFi-DdCBE (Fig. 3C). HiFi-DdCBE largely maintains the activity of DdCBE while improving its specificity, thus achieving high efficiency and precision suitable for therapeutic applications (Lee et al. 2023). In addition, new DddA homologs have also been discovered and reported by different groups, providing diverse choices for constructing mitochondrial base editors. For example, a new DddA homolog named DddA_Ss was identified from Simiaoa sunii, which has been used to develop a new DdCBE_Ss that enables editing in the 5'-GC context in dsDNA (Mi et al. 2023). Guo et al. also identified a novel DddA homolog from Roseburia intestinalis and referred to it as DddA_Ri. They successfully developed CRISPR-based nuclear genome cytosine base editors (crDdCBE) and TALE-based mitochondrial genome cytosine base editors (mitoCBE) with DddA_Ri, achieving efficient dsDNA editing in nuclear and mitochondrial genomes, respectively. Compared to DddA11, DddA_Ri-derived mitoCBE completely overcomes the 5'-TC context limitation (Guo et al. 2023). Recently, Huang et al. used AlphaFold2-based structural clustering to identify many new DddA-like clades, which have substrate preferences distinct from 5'-TC. Among them, the newly identified Ddd1 and Ddd9 exhibit higher activity at the 5'-GC motif (Huang et al. 2023). These newly discovered DddA proteins greatly enriched the mitochondrial base editing toolkits (Cheng et al. 2023; Kim and Chen 2023) (Fig. 3D). These studies also highlight the existence of dsDNA deaminases with diverse editing characteristics, which deserve to be further explored and applied.
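Because the deaminases discussed above differ in their preferred dinucleotide context (5'-TC, 5'-GC or 5'-CC), it can be useful to list the cytosines of a dsDNA target that fall in a given context on either strand. The Python sketch below is one way to do this; the function name, the coordinate convention and the toy sequence are assumptions made for illustration, and context match alone does not predict editing efficiency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cytosines_in_context(top_strand: str, five_prime_base: str) -> dict[str, list[int]]:
    """List cytosines preceded (on their 5' side) by five_prime_base on each strand.

    Positions are 1-based coordinates on the top strand. For bottom-strand hits,
    the reported position is that of the top-strand G paired with the target C.
    """
    top = top_strand.upper()
    bottom = top.translate(COMPLEMENT)[::-1]  # bottom strand written 5'->3'
    motif = five_prime_base.upper() + "C"
    n = len(top)
    hits: dict[str, list[int]] = {"top": [], "bottom": []}
    for i in range(n - 1):
        if top[i:i + 2] == motif:
            hits["top"].append(i + 2)          # the C is the second base of the motif
        if bottom[i:i + 2] == motif:
            hits["bottom"].append(n - (i + 1))  # map the bottom-strand C back to top coordinates
    hits["bottom"].sort()
    return hits


# Toy duplex (hypothetical sequence), top strand written 5'->3'.
toy = "ATCGGCAGA"
tc_sites = cytosines_in_context(toy, "T")
gc_sites = cytosines_in_context(toy, "G")
```

For this toy duplex, the 5'-TC scan reports position 3 on the top strand and position 8 on the bottom strand, while the 5'-GC scan reports positions 6 and 5, respectively.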

TadA-WT and TadA variant
Inspired by CBE, another type of genome editing technology capable of altering A to G was developed. However, native adenosine deaminases can only deaminate free adenosine, the adenosine in single-stranded RNA or the adenosine in the RNA of mismatched RNA-DNA heteroduplexes, but have no activity on adenosine in dsDNA or ssDNA (Zheng et al. 2017). To establish an ABE, Escherichia coli TadA was selected for seven rounds of directed evolution in vitro to obtain TadA* (hereafter referred to as TadA-7.10), which exhibits adenosine deamination activity in ssDNA. The authors then artificially constructed a TadA-WT:TadA-7.10 heterodimeric adenosine deaminase, which, when fused with nCas9 (D10A), generated ABE7.10. ABE7.10 exhibited efficient A-to-G editing within its editing window (A4-A7) in mammalian cells (Gaudelli et al. 2017) (Fig. 4A). Afterward, ABE7.10 was further evolved to generate ABE8e, which carries a single TadA domain (TadA-8e) and deaminates DNA at a higher rate than ABE7.10 (Richter et al. 2020). Notably, the TadA-8e moiety carries 8 additional mutations as compared to TadA-7.10 (Figs. 4C and 4D, Table 2).
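The notion of an editing window such as A4-A7 can be made concrete with a short Python sketch that lists the adenines of a protospacer falling inside the window and returns the idealized edited sequence. The sketch assumes that position 1 is the PAM-distal (5') end of the protospacer and that every adenine in the window is converted; both are simplifications stated here for illustration, since real editing is partial and position dependent. The function names and the protospacer sequence are hypothetical.

```python
def adenines_in_window(protospacer: str, window: tuple[int, int] = (4, 7)) -> list[int]:
    """Return 1-based protospacer positions of adenines inside the editing window."""
    start, end = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == "A" and start <= pos <= end
    ]


def simulate_abe(protospacer: str, window: tuple[int, int] = (4, 7)) -> str:
    """Return the protospacer with every in-window A read as G (idealized outcome)."""
    targets = set(adenines_in_window(protospacer, window))
    return "".join(
        "G" if pos in targets else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


# Hypothetical 20-nt protospacer, written 5'->3' with position 1 at the PAM-distal end.
protospacer = "GACAAGTCATGCCTGAACGT"
window_sites = adenines_in_window(protospacer)
edited = simulate_abe(protospacer)
```

In this toy case, the adenines at positions 4 and 5 lie in the window, whereas those at positions 2, 9, 16 and 17 are left unchanged. Passing a narrower `window` mimics the effect of a variant with a narrower editing window.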
The TadA-WT enzyme physiologically deaminates adenine 34 (A34) in E. coli tRNA(Arg2), and the TadA variants used in ABEs have not completely lost their RNA editing activity (Kim et al. 2006; Rees et al. 2019). Therefore, to reduce the RNA off-targeting of ABEs, some critical mutations were introduced to TadA, including K20A/R21A and V82G in two versions of miniABEmax (Grunewald et al. 2019b) (Table 2), as well as the F148A mutation (Zhou et al. 2019) and the V106W mutation (Rees et al. 2019) in the deaminase domain of TadA-7.10. The residues K20 and R21 are solvent exposed in the α1-helix, and their substitution to alanine changes the positive charge of this surface. This probably reduces the interaction between the deaminase domain and RNA, and thus reduces RNA deamination in cells. Since V82 and V106 are both located at the bottom of the catalytic pocket, their substitution likely reshapes the active center, resulting in decreased activity on RNA (Fig. 4C) (Lapinaite et al. 2020).

Fig. 4 Structural analysis of TadA and TadA variants in ABE. A Schematic illustrating the design of TadA-derived adenine base editor (ABE). Evolved TadA variants can deaminate adenosines in ssDNA to yield inosines, which are read as guanosines by DNA polymerase. B Cartoon topology of TadA-WT (PDBID: 1Z3A) shows that the core CDA fold is composed of a five-strand β-sheet (β1-β5) and five α-helices (α1-α5). C Cartoon representations of TadA-8e in complex with NTS DNA (partial sequence shown). The evolved residues are shown as sticks and colored purple (TadA-7.10) or red (TadA-8e) (PDBID: 6VPC). The side view (left) with ssDNA and the top view without ssDNA substrate (right) are both shown. D Sequence alignment of the TadA and TadA variants (TadA-7.10, TadA-8e and TadA-9). The secondary structure elements (α-helices and β-strands) of TadA-WT (PDBID: 1Z3A) and TadA-8e (PDBID: 6VPC) are shown above the alignment. The mutations introduced during the directed evolution of ABE7.10, ABE8e and ABE9 are labeled with purple, red, and brown triangles, respectively
Chen et al. employed a structure-based mutagenesis strategy and identified the L145T mutation to develop ABE9. In comparison to ABE8e, ABE9 has a narrower editing window (Chen et al. 2023). The mutations in ABE8e and ABE9 are primarily located in the active site loops and the C-terminal α-helix, similar to the mutations in ABE7.10 (Fig. 4D, Table 2). Unlike the plethora of ssDNA-editing cytidine deaminases in the APOBEC/AID family and other clades, no adenosine deaminase is known to act naturally on ssDNA, which will bring more challenges to the further development of ABE.
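Substitutions such as K20A/R21A, V106W or L145T follow the usual notation of wild-type residue, 1-based position and new residue. The Python sketch below parses this notation and applies it to a sequence, checking that the stated wild-type residue is actually present. The function name is hypothetical, and the five-residue sequence in the example is a toy placeholder, not the TadA sequence.

```python
import re

# Wild-type residue, 1-based position, new residue (e.g. V106W).
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, mutations: list[str]) -> str:
    """Apply point substitutions; labels may be combined with '/', e.g. 'K20A/R21A'."""
    residues = list(sequence)
    for entry in mutations:
        for label in entry.split("/"):
            match = MUTATION.match(label.strip())
            if match is None:
                raise ValueError(f"cannot parse mutation {label!r}")
            wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
            if not 1 <= position <= len(sequence):
                raise ValueError(f"position {position} is outside the sequence")
            # Compare against the original sequence so that a wrong label is caught early.
            if sequence[position - 1] != wild_type:
                raise ValueError(
                    f"{label}: expected {wild_type} at {position}, found {sequence[position - 1]}"
                )
            residues[position - 1] = new
    return "".join(residues)


# Toy placeholder sequence (hypothetical, not TadA) with toy mutation labels.
variant = apply_mutations("MKRVL", ["K2A/R3A", "V4G"])
```

The wild-type check guards against numbering offsets, which are a common source of error when mutation lists are transferred between constructs.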

ADAR family
Two different enzymes carry out A-to-I editing in humans: ADAR1 and ADAR2 (Bass et al. 1997). ADARs share a common domain architecture, consisting of a variable number of amino-terminal dsRNA binding domains (dsRBDs) and a carboxy-terminal catalytic deaminase domain (Goodman et al. 2012). Similar to TadA, ADAR targets adenosines in double-stranded RNA (dsRNA) and deaminates them into inosines, which are biochemically interpreted as guanosines, thereby introducing functional A-to-G mutations into RNA (Bass and Weintraub 1988; Savva et al. 2012).
Cox et al. fused the adenine deaminase domain of ADAR2 (ADAR2DD) with the catalytically inactive Cas13 protein, which enabled A-to-I RNA editing at the transcriptome level, and named the editor RNA Editing for Programmable A to I Replacement (REPAIR) (Cox et al. 2017) (Fig. 5A). Furthermore, to minimize the substantial off-target RNA editing associated with the first version of REPAIR (REPAIRv1), they introduced E488Q/T375G into ADAR2DD and developed REPAIRv2, which can induce specific and efficient A-to-I base editing in RNA (Cox et al. 2017) (Fig. 5B, Table 3).
For C-to-U RNA base editing, Abudayyeh et al. focused on the ADAR2DD residues contacting RNA substrates and performed three rounds of rational mutagenesis on ADAR2DD fused with a catalytically inactive Cas13b homolog (Abudayyeh et al. 2019) (Fig. 5B). This effort resulted in RESCUEr3 (RESCUE round 3), which exhibited improved C-to-U editing activity. Building upon this, they initiated directed evolution within ADAR2DD to identify additional candidate mutations that would enhance RESCUE activity in yeast. After 16 rounds of evolution, they ultimately obtained the final construct, RESCUEr16 (hereafter referred to as RESCUE), which manifested significantly increased C-to-U deamination activity at all tested targets in the context of any flanking 5' and 3' bases while retaining A-to-I editing activity (Abudayyeh et al. 2019). The major mutations of RESCUE are shown as green sticks in the structure of ADAR2DD with no duplex RNA bound (Fig. 5B). Mutations introduced to the catalytic core (V351G and K350I) and to the regions contacting the RNA target (S486A, S495N) are both essential to RESCUE activity (Table 3). Huang et al. fused human A3A (hA3A) with dPspCas13b to create a distinct C-to-U RNA editor called C-to-U RNA Editor (CURE), which incorporates an editing-enhancing mutation Y132D (Huang et al. 2021b). Unlike RESCUE, CURE is designed exclusively for C-to-U editing and does not [...] (Qu et al. 2019). LEAPER exhibits high editing specificity with rare off-target mutations and limited bystander editing. Reautschnig et al. optimized the design of gRNAs by combining the target site of the gRNA with a cluster of recruiting sequences (RS) freely distributed across the target RNA and named these gRNAs CLUSTER gRNAs (Reautschnig et al. 2022). This CLUSTER design resulted in gRNAs with high sequence flexibility and enabled efficient RNA A-to-I editing both in cultured cells and in vivo with significantly reduced bystander editing. Recently, Katrekar et al. and Yi et al. employed covalently closed circular arRNAs, named cadRNAs and LEAPER 2.0, to further enhance editing efficiency and reduce bystander editing (Katrekar et al. 2022; Yi et al. 2022).

Though DNA editing in the genome can potentially provide long-lasting and even permanent cures, it comes with the potential risk of long-lasting off-target effects. In contrast, RNA editing offers tunability and reversibility as it does not cause permanent changes in the genome. Therefore, RNA editing has unique advantages in certain therapeutic contexts.

PERSPECTIVE
Since the first series of BEs was developed in 2016, BEs have become revolutionary gene editing tools (Gaudelli et al. 2017; Komor et al. 2016). Great efforts have been made to develop modular BEs with high precision and efficiency (Huang et al. 2021a; Kim and Chen 2023; Yang et al. 2019; Yang and Chen 2020). In this review, we have described different types of nucleoside deaminases that are the key effectors of these gene editing tools, with an emphasis on their structures and functionality. High-resolution structures of base editors have detailed the interfaces between deaminases and their nucleotide substrates, providing the blueprint for subsequent rational design and engineering of deaminases to improve the efficiency of the corresponding base editors and reduce off-target risks. Besides structure-guided rational design, directed evolution of deaminases is also a classic route and has achieved great success, as with TadA-8e and the ADAR2 variant in RESCUE, and it should remain an important path for deaminase optimization in the future. More recently, AI-assisted structural classification has also been successfully applied to discover novel ssDNA and dsDNA cytidine deaminases, suggesting that AI-based discovery of new tool enzymes is an effective approach alongside directed evolution and structure-guided rational engineering of known proteins.
Though current BEs have realized targeted editing of nucleic acid substrates in various contexts, new BEs are still needed to achieve specific and unlimited base editing at all desired sites.Such new BEs can be generated by discovering new nucleoside deaminase families and engineering the vast pool of nucleoside deaminases.We envision that these new BEs will be broadly used in biotechnology, basic research, and translational medicine in the future.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 1
Fig. 1 Cytidine and adenosine deamination processes. A Cytidine deamination generates uridine, which is read as thymidine by DNA polymerase. C, cytidine; U, uridine; T, thymidine. R represents 2´-deoxyribose in DNA or ribose in RNA. B Adenosine deamination generates inosine, which is read as guanosine by DNA polymerase. A, adenosine; I, inosine; G, guanosine. R represents 2´-deoxyribose in DNA or ribose in RNA

Fig. 2
Fig. 2 The conserved core cytidine deaminase domain of the AID/APOBEC family. A Schematic illustration of AID/APOBEC-derived CBE. B Schematic of the AID/APOBEC family. Each member of the family contains the core catalytically active zinc-dependent cytidine deaminase domain (CDA), labeled with a star. C Cartoon topology of hA3A (PDBID: 5KEG) illustrating the typical core CDA fold shared by the AID/APOBEC family. The CDA fold is composed of a five-strand β-sheet (β1-β5) surrounded by six α-helices (α1-α6). D Cartoon representations of rA1 (Uniprot: P38483, generated from Alphafold2), hAID (PDBID: 5W0U), mA3-CDA1 (Uniprot: Q99J72, generated from Alphafold2), hA3A (PDBID: 5KEG), and hA3G-CDA2 (PDBID: 6BUX) structures. The target dC located at the bottom of the catalytic pocket is shown as a ball-and-stick model and colored in orange. The Zn ion is depicted as a grey sphere. Positions of the engineered residues in optimized CBEs are highlighted with green sticks. Red dashed circles indicate the catalytically active pocket

Fig. 3
Fig. 3 Deaminase toxin A (DddA) and its characteristics in genomic DNA and mitochondrial DNA editing. A Schematic illustration of the design of DddA-derived cytosine base editor (DdCBE) and TALE-linked Deaminase (TALED). B Cartoon topology of DddA (PDBID: 8E5E) shows that the conserved core CDA fold is composed of a five-strand β-sheet (β1-β5) and two α-helices (α1 and α2). C Cartoon representations of DddA in complex with a dsDNA substrate (PDBID: 8E5E). The target dC located at the bottom of the catalytic pocket is shown as a ball-and-stick model and colored in orange. The Zn ion is depicted as a grey sphere. Positions of engineered residues in DddA6, DddA11 and HiFi-DddA are highlighted with sticks and colored in salmon, blue, and green, respectively. The side view (left) with dsDNA and the top view without dsDNA (right) are both shown. D Substrate preferences of DddA, its engineered variants and newly discovered homologs, which were all used to develop more advanced mitochondrial BEs

Fig. 5
Fig. 5 Adenine base editing in RNA. A Schematic of RNA editing by dCas13b-ADAR2DD fusion proteins (REPAIR) or dCas13b-ADAR2DD variant fusion proteins (RESCUE). B Structure of ADAR2DD E488Q bound to the duplex RNA (PDBID: 5ED1). Positions of evolved key residues in the RESCUE system are shown as green sticks. The side view with the duplex RNA (left) and the top view without the RNA substrate (right) are both shown

Table 1
Types and characteristics of cytidine deaminases in CBEs

Table 2
Types and characteristics of TadA variants in ABEs

Table 3
Types and characteristics of ADAR deaminases used in RNA editing